sdg method
TAMIS: Tailored Membership Inference Attacks on Synthetic Data
Paul Andrey, Batiste Le Bars, Marc Tommasi
Membership Inference Attacks (MIA) make it possible to empirically assess the privacy of a machine learning algorithm. In this paper, we propose TAMIS, a novel MIA against differentially-private synthetic data generation methods that rely on graphical models. This attack builds upon MAMA-MIA, a recently-published state-of-the-art method, while lowering its computational cost and requiring less attacker knowledge. Our attack is the product of a two-fold improvement. First, we recover the graphical model that generated a synthetic dataset using solely that dataset, rather than shadow-modeling over an auxiliary one. This proves less costly and more performant. Second, we introduce a more mathematically-grounded attack score that provides a natural threshold for binary predictions. In our experiments, TAMIS achieves performance better than or similar to that of MAMA-MIA on replicas of the SNAKE challenge.
Could Synthetic Data Be the Future of Data Sharing? - CPO Magazine
Synthetic data generation (SDG) is rapidly emerging as a practical privacy-enhancing technology (PET) for sharing data for secondary purposes. It does so by generating non-identifiable datasets that can be used and disclosed without the legislative need for additional consent, given that these datasets would not be considered personal information. Having worked in the privacy and data anonymization space for over 15 years, I have seen the limitations of traditional de-identification methods become more evident. This creates room for modern PETs that can enable the responsible processing of data for secondary purposes. There's a growing appetite from CPOs to understand where SDG fits as a PET, how it's generated, what problems it can solve, and how laws and regulations apply. In a nutshell, synthetic data is generated from real data.